BigDataScript: a scripting language for data pipelines

Authors

  • Pablo Cingolani
  • Rob Sladek
  • Mathieu Blanchette
Abstract

MOTIVATION: The analysis of large biological datasets often requires complex processing pipelines that run for a long time on large computational infrastructures. We designed and implemented a simple script-like programming language with a clean and minimalist syntax to develop and manage pipeline execution and to provide portability as well as robustness to various types of software and hardware failures.

RESULTS: We introduce the BigDataScript (BDS) programming language for data processing pipelines, which improves abstraction from hardware resources and assists with robustness. Hardware abstraction allows BDS pipelines to run without modification on a wide range of computer architectures, from a small laptop to multi-core servers, server farms, clusters and clouds. BDS achieves robustness by incorporating the concepts of absolute serialization and lazy processing, thus allowing pipelines to recover from errors. By abstracting pipeline concepts at the programming language level, BDS simplifies implementation, execution and management of complex bioinformatics pipelines, resulting in reduced development and debugging cycles as well as cleaner code.

AVAILABILITY AND IMPLEMENTATION: BigDataScript is available under an open-source license at http://pcingola.github.io/BigDataScript.
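The abstract names lazy processing and serialization without showing what they look like in practice. The following is a minimal Python sketch of the general ideas, not BDS syntax and not the authors' implementation. It assumes that "lazy processing" means a step runs only when its outputs are missing or older than its inputs, and it approximates serialization with a pickled state dictionary. All function names (needs_update, lazy_step, save_checkpoint, load_checkpoint) are hypothetical.

```python
import os
import pickle


def needs_update(outputs, inputs):
    """Return True if any output is missing or older than the newest input.

    Assumption: every path in `inputs` exists; a missing input raises
    FileNotFoundError from os.path.getmtime.
    """
    if not outputs or not all(os.path.exists(p) for p in outputs):
        return True
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return oldest_output < newest_input


def lazy_step(outputs, inputs, action):
    """Run `action` only when the outputs are stale; report whether it ran."""
    if needs_update(outputs, inputs):
        action()
        return True
    return False


def save_checkpoint(path, state):
    """Write the pipeline state to disk, replacing any older checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as fh:
        pickle.dump(state, fh)
    # os.replace swaps the file in one step, so a crash mid-write
    # does not leave a truncated checkpoint at `path`.
    os.replace(tmp, path)


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

With helpers like these, a pipeline restarted after a failure would skip every step whose outputs are already up to date and could reload its saved state instead of starting from scratch. BDS provides this behavior at the language level rather than through library calls.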

Similar resources

Bluima: a UIMA-based NLP Toolkit for Neuroscience

This paper describes Bluima, a natural language processing (NLP) pipeline focusing on the extraction of neuroscientific content and based on the UIMA framework. Bluima builds upon models from biomedical NLP (BioNLP) like specialized tokenizers and lemmatizers. It adds further models and tools specific to neuroscience (e.g. named entity recognizer for neuron or brain region mentions) and provide...

StarFlow: A Script-Centric Data Analysis Environment

We introduce StarFlow, a script-centric environment for data analysis. StarFlow has four main features: (1) extraction of control and data-flow dependencies through a novel combination of static analysis, dynamic runtime analysis, and user annotations, (2) command-line tools for exploring and propagating changes through the resulting dependency network, (3) support for workflow abstractions ena...

KEGGexpressionMapper allows for analysis of pathways over multiple conditions by integrating transcriptomics and proteomics measurements

Motivation: In transcriptomic and proteomics-based studies, the abundance of genes is often compared to functional pathways such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) to identify active metabolic processes. Even though a plethora of tools allow to analyze and to compare omics data in respect to KEGG pathways, the analysis of multiple conditions is often limited to only a define...

Proteomics to go: Proteomatic enables the user-friendly creation of versatile MS/MS data evaluation workflows

UNLABELLED We present Proteomatic, an operating system independent and user-friendly platform that enables the construction and execution of MS/MS data evaluation pipelines using free and commercial software. Required external programs such as for peptide identification are downloaded automatically in the case of free software. Due to a strict separation of functionality and presentation, and s...

A Binary Data Stream Scripting Language

Any file is fundamentally a binary data stream. A practical solution was achieved to interpret binary data stream. A new scripting language named Data Format Scripting Language (DFSL) was developed to describe the physical layout of the data in a structural, more intelligible way. On the basis of the solution, a generic software application was implemented; it parses various binary data streams...

Journal title:

Volume 31  Issue 

Pages  -

Publication date 2015